A modular tool to aggregate results from bioinformatics analyses across many samples into a single report.
Report
generated on 2025-05-20, 03:50 UTC
based on data in:
/home/runner/work/pmultiqc/pmultiqc/data
pmultiqc
pmultiqc is a MultiQC module that shows the performance of mass spectrometry based quantification pipelines such as nf-core/quantms and MaxQuant. URL: https://github.com/bigbio/pmultiqc
Parameters
The MaxQuant parameters, extracted from parameters.txt, summarize the settings used for the MaxQuant analysis. Key parameters are the MaxQuant version, Re-quantify, Match-between-runs and the mass search tolerances. A list of protein database files is also provided, which makes it possible to track database completeness and database version information (if given in the filename).
| No. | Parameter | Value |
|---|---|---|
| 1 | Version | 1.5.2.8 |
| 2 | User name | cbielow |
| 3 | Machine name | CD02-WIN7 |
| 4 | Date of writing | 08/05/2015 11:38:59 |
| 5 | Fixed modifications | Carbamidomethyl (C) |
| 6 | Decoy mode | revert |
| 7 | Special AAs | KR |
| 8 | Include contaminants | True |
| 9 | MS/MS tol. (FTMS) | 20 ppm |
| 10 | Top MS/MS peaks per 100 Da. (FTMS) | 12 |
| 11 | MS/MS deisotoping (FTMS) | True |
| 12 | MS/MS tol. (ITMS) | 0.5 Da |
| 13 | Top MS/MS peaks per 100 Da. (ITMS) | 8 |
| 14 | MS/MS deisotoping (ITMS) | False |
| 15 | MS/MS tol. (TOF) | 40 ppm |
| 16 | Top MS/MS peaks per 100 Da. (TOF) | 10 |
| 17 | MS/MS deisotoping (TOF) | True |
| 18 | MS/MS tol. (Unknown) | 0.5 Da |
| 19 | Top MS/MS peaks per 100 Da. (Unknown) | 8 |
| 20 | MS/MS deisotoping (Unknown) | False |
| 21 | PSM FDR | 0.0 |
| 22 | Protein FDR | 0.0 |
| 23 | Site FDR | 0.0 |
| 24 | Use Normalized Ratios For Occupancy | True |
| 25 | Min. peptide Length | 7 |
| 26 | Min. score for unmodified peptides | 0 |
| 27 | Min. score for modified peptides | 40 |
| 28 | Min. delta score for unmodified peptides | 0 |
| 29 | Min. delta score for modified peptides | 6 |
| 30 | Min. unique peptides | 0 |
| 31 | Min. razor peptides | 1 |
| 32 | Min. peptides | 1 |
| 33 | Use only unmodified peptides and | True |
| 34 | Modifications included in protein quantification | Acetyl (Protein N-term);Oxidation (M) |
| 35 | Peptides used for protein quantification | Razor |
| 36 | Discard unmodified counterpart peptides | True |
| 37 | Min. ratio count | 2 |
| 38 | Re-quantify | False |
| 39 | Use delta score | False |
| 40 | iBAQ | False |
| 41 | iBAQ log fit | False |
| 42 | Match between runs | True |
| 43 | Matching time window [min] | 0.7 |
| 44 | Alignment time window [min] | 20 |
| 45 | Find dependent peptides | False |
| 46 | Fasta file | crap_withMycoplasma.fasta;uniprot_human_canonical_and_isoforms_20130513.fasta |
| 47 | Labeled amino acid filtering | True |
| 48 | Site tables | Oxidation (M)Sites.txt |
| 49 | RT shift | False |
| 50 | Advanced ratios | True |
| 51 | First pass AIF correlation | 0.8 |
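
The table above is read from parameters.txt. As a minimal sketch of how such a file could be loaded, the snippet below assumes a tab-separated layout with one header row followed by parameter/value pairs; the function names `read_parameters` and `fasta_files` are hypothetical and are not taken from pmultiqc.

```python
import csv
import io


def read_parameters(handle):
    """Read tab-separated parameter/value rows into a dict (assumed layout)."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row
    return {row[0]: row[1] for row in reader if len(row) >= 2}


def fasta_files(parameters):
    """Split the semicolon-separated 'Fasta file' entry into single names."""
    return [name for name in parameters.get("Fasta file", "").split(";") if name]


# Example input built from values shown in the table above.
sample = (
    "Parameter\tValue\n"
    "Version\t1.5.2.8\n"
    "Match between runs\tTrue\n"
    "Fasta file\tcrap_withMycoplasma.fasta;"
    "uniprot_human_canonical_and_isoforms_20130513.fasta\n"
)
params = read_parameters(io.StringIO(sample))
print(params["Version"])
print(fasta_files(params))
```

Splitting the 'Fasta file' entry makes it easier to spot a database version encoded in a filename, such as the date suffix in the second file.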
HeatMap
This heatmap provides an overview of the performance of the MaxQuant results.
The plot summarizes pipeline performance using the following calculated metrics:
- Heatmap score [Contaminants]: Fraction of summed intensity, with 0 = sample full of contaminants and 1 = no contaminants.
- Heatmap score [Pep Intensity (>23.0)]: Linear scale of the median intensity relative to the threshold, e.g. reaching 2^21 of 2^23 gives a score of 0.25.
- Heatmap score [Charge]: Deviation of the charge 2 proportion from a representative Raw file (median). For tryptic digests, peptides of charge 2 (one charge at the N-terminus and one at the tryptic C-terminal R or K residue) should be dominant. Ionization issues (voltage?), in-source fragmentation, missed cleavages and buffer irregularities can cause a shift (see Bittremieux 2017, DOI: 10.1002/mas.21544).
- Heatmap score [Missed Cleavages]: The fraction (0% - 100%) of fully cleaved peptides per Raw file.
- Heatmap score [Missed Cleavages Var]: Each Raw file is scored for its deviation from the ‘average’ digestion state of the current study.
- Heatmap score [ID rate over RT]: Judges column occupancy over retention time. Ideally, the LC gradient is chosen such that the number of identifications (here, after FDR filtering) is uniform over time, to ensure consistent instrument duty cycles. Sharp peaks and an uneven distribution of identifications over time indicate potential for LC gradient optimization. Scored using the ‘Uniform’ scoring function, i.e. a constant distribution receives a good score and extreme shapes receive a bad one.
- Heatmap score [MS2 Oversampling]: The percentage of non-oversampled 3D-peaks. An oversampled 3D-peak is defined as a peak whose peptide ion (same sequence and same charge state) was identified by at least two distinct MS2 spectra in the same Raw file. For high complexity samples, oversampling of individual 3D-peaks automatically leads to undersampling or even omission of other 3D-peaks, reducing the number of identified peptides.
- Heatmap score [Pep Missing Values]: Linear scale of the fraction of missing peptides.
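
Two of the scores in the list above can be written down directly from their descriptions. The following is a minimal sketch, not the pmultiqc implementation: the function names are hypothetical, and capping the intensity score at 1 is an assumption made so that files above the threshold do not exceed the best score.

```python
def contaminant_score(contaminant_intensity, total_intensity):
    """1 = no contaminants, 0 = all summed intensity comes from contaminants."""
    if total_intensity <= 0:
        raise ValueError("total_intensity must be positive")
    return 1.0 - contaminant_intensity / total_intensity


def pep_intensity_score(median_intensity, threshold_log2=23.0):
    """Linear fraction of the threshold that the median intensity reaches.

    Capping at 1.0 is an assumption of this sketch.
    """
    return min(1.0, median_intensity / 2.0 ** threshold_log2)


print(pep_intensity_score(2.0 ** 21))  # 0.25, the example from the list above
print(contaminant_score(25.0, 100.0))  # 0.75
```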
Intensity Distribution
Intensity boxplots by experimental groups. Groups are user-defined during MaxQuant configuration. This plot displays a (customizable) threshold line for the desired mean intensity of proteins. Groups which underperform here are likely to also suffer from a worse MS/MS ID rate and higher contamination due to the lack of total protein loaded/detected. If possible, all groups should show a high and consistent amount of total protein.
The height of the bar correlates to the number of proteins with non-zero abundance.
LFQ Intensity Distribution
Label-free quantification (LFQ) intensity boxplots by experimental groups.
Groups are user-defined during MaxQuant configuration. This plot displays a (customizable) threshold line for the desired mean LFQ intensity of proteins. Raw files which underperform in Raw intensity are likely to show an increased mean here, since only high-abundance proteins are recovered and quantifiable by MaxQuant in this Raw file. The remaining proteins are likely to receive an LFQ value of 0 (i.e. they do not contribute to the distribution).
The height of the bar correlates to the number of proteins with non-zero abundance.
PCA of Raw Intensity
[Excludes Contaminants] Principal components plots of experimental groups (as defined during MaxQuant configuration).
This plot is shown only if more than one experimental group was defined. If LFQ was activated in MaxQuant, an additional PCA plot for LFQ intensities is shown. The same applies if iTRAQ/TMT reporter intensities are detected. Since experimental groups and Raw files do not necessarily correspond 1:1, this plot cannot use the abbreviated Raw file names, but instead must rely on automatic shortening of group names.
PCA of LFQ Intensity
[Excludes Contaminants] Principal components plots of experimental groups (as defined during MaxQuant configuration).
This plot is shown only if more than one experimental group was defined. If LFQ was activated in MaxQuant, an additional PCA plot for LFQ intensities is shown. The same applies if iTRAQ/TMT reporter intensities are detected. Since experimental groups and Raw files do not necessarily correspond 1:1, this plot cannot use the abbreviated Raw file names, but instead must rely on automatic shortening of group names.
MS/MS Identified per Raw File
MS/MS identification rate per Raw file from summary.txt.
Peptide Intensity Distribution
Peptide precursor intensity per Raw file from evidence.txt WITHOUT match-between-runs evidence.
Low peptide intensity usually goes hand in hand with low MS/MS identification rates and unfavourable signal/noise ratios, which makes signal detection harder. Instrument acquisition time also increases for trapping instruments. Failing to reach the intensity threshold is usually due to unfavourable column conditions, inadequate column loading or ionization issues. If the study is not a dilution series or pulsed SILAC experiment, we would expect every condition to have about the same median log-intensity (of 2^23.0). The relative standard deviation (RSD) gives an indication of reproducibility across files and should be below 5%.
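
The reproducibility check described above can be sketched as follows. This is an illustration under assumptions, not the code behind the plot: the function names are hypothetical, zero intensities are skipped before taking log2, and the RSD uses the sample standard deviation.

```python
import math
import statistics


def median_log2_intensity(intensities):
    """Median of log2 intensities for one Raw file, ignoring non-positive values."""
    logged = [math.log2(x) for x in intensities if x > 0]
    return statistics.median(logged)


def relative_std_dev(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Hypothetical median log2 intensities of three Raw files.
medians = [24.0, 25.0, 26.0]
rsd = relative_std_dev(medians)
print(f"RSD = {rsd:.1f}%, below 5%: {rsd < 5.0}")
```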
Potential Contaminants per Group
Potential contaminants per group from proteinGroups.txt.
External protein contamination should be controlled for. MaxQuant therefore ships with a comprehensive, yet customizable, protein contamination database, which it searches by default. This contamination plot is derived from the proteinGroups (PG) table and shows the fraction of total protein intensity attributable to contaminants.
Note that this plot is based on experimental groups, and therefore may not correspond 1:1 to Raw files.
Top5 Contaminants per Raw file
The five most abundant external protein contaminants by Raw file
pmultiqc explicitly shows the five most abundant external protein contaminants (as detected via MaxQuant's contaminants FASTA file) by Raw file, and summarizes the remaining contaminants as 'other'. This lets you track down exactly which proteins contaminate your sample. Lower contamination is better.
If you see fewer than five contaminants, either there are actually fewer, or one (or more) of the shortened contaminant names subsumes several of the top five contaminants (since they start with the same prefix).
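
The "top five plus 'other'" summary can be sketched in a few lines. The function name `top_contaminants` and the example protein names and intensities are hypothetical; the sketch only illustrates the grouping described above.

```python
def top_contaminants(intensity_by_contaminant, n=5):
    """Keep the n most intense contaminants and sum the rest as 'other'."""
    ranked = sorted(
        intensity_by_contaminant.items(), key=lambda item: item[1], reverse=True
    )
    summary = dict(ranked[:n])
    remainder = sum(value for _, value in ranked[n:])
    if remainder > 0:
        summary["other"] = remainder
    return summary


# Hypothetical contaminant intensities for one Raw file.
example = {
    "KRT1": 900, "KRT10": 700, "TRYP_PIG": 500, "ALBU_BOVIN": 400,
    "KRT2": 300, "KRT9": 200, "CASB_BOVIN": 100,
}
print(top_contaminants(example))
```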
Charge-state per Raw file
The distribution of the charge-state of the precursor ion, excluding potential contaminants.
Modifications per Raw file
An occurrence table of modifications (e.g. Oxidation (M)) for all peptides, including unmodified ones.
Post-translational modifications contained within the identified peptide sequence.
Peptide ID Count
[Excludes Contaminants] Number of unique (i.e. not counted twice) peptide sequences including modifications (after FDR) per Raw file.
If MBR was enabled, three categories ('Genuine (Exclusive)', 'Genuine + Transferred', 'Transferred (Exclusive)') are shown, so the user can judge the gain that MBR provides. Peptides in the 'Genuine + Transferred' category were identified within the Raw file by MS/MS, but at the same time also transferred to this Raw file using MBR. This ID transfer can be correct (e.g. in case of different charge states) or incorrect; see the MBR-related metrics to tell the difference. Ideally, the 'Genuine + Transferred' category should be rather small and the other two should be large.
If MBR were switched off, you could expect to see the number of peptides corresponding to 'Genuine (Exclusive)' + 'Genuine + Transferred'. In general, if the MBR gain is low and the MBR scores are bad (see the two MBR-related metrics), MBR should be switched off for the affected Raw files (which could be a few or all).
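
The three MBR categories amount to a set partition, which the following sketch makes explicit. It assumes that, for one Raw file, the peptide IDs obtained by MS/MS and those obtained by transfer are available as two sets; the function names and example peptides are hypothetical.

```python
def mbr_categories(msms_ids, transferred_ids):
    """Count peptides per MBR category for one Raw file."""
    return {
        "Genuine (Exclusive)": len(msms_ids - transferred_ids),
        "Genuine + Transferred": len(msms_ids & transferred_ids),
        "Transferred (Exclusive)": len(transferred_ids - msms_ids),
    }


def count_without_mbr(categories):
    """Number of peptides expected if MBR were switched off."""
    return categories["Genuine (Exclusive)"] + categories["Genuine + Transferred"]


msms = {"PEPTIDEK", "ELVISK", "LIVESR", "SAMPLER"}
transferred = {"SAMPLER", "MATCHEDK", "BETWEENR"}
counts = mbr_categories(msms, transferred)
print(counts)
print(count_without_mbr(counts))  # 4, i.e. every peptide identified by MS/MS
```

The gain from MBR in this picture is the 'Transferred (Exclusive)' count relative to the count without MBR.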
ProteinGroups Count
[Excludes Contaminants] Number of Protein groups (after FDR) per Raw file.
If MBR was enabled, three categories ('Genuine (Exclusive)', 'Genuine + Transferred', 'Transferred (Exclusive)') are shown, so the user can judge the gain that MBR provides. Here, 'Transferred (Exclusive)' means that this protein group has peptide evidence which originates only from transferred peptide IDs. The quantification is (of course) always from the local Raw file. Proteins in the 'Genuine + Transferred' category have peptide evidence from MS/MS within the Raw file, but peptide IDs transferred to this Raw file using MBR were also used. It is not unusual for the 'Genuine + Transferred' category to be rather large, since a protein group usually has peptide evidence from both sources. To see if MBR worked, it is better to look at the two MBR-related metrics.
If MBR were switched off, you could expect to see the number of protein groups corresponding to 'Genuine (Exclusive)' + 'Genuine + Transferred'. In general, if the MBR gain is low and the MBR scores are bad (see the two MBR-related metrics), MBR should be switched off for the affected Raw files (which could be a few or all).
Oversampling Distribution
An oversampled 3D-peak is defined as a peak whose peptide ion (same sequence and same charge state) was identified by at least two distinct MS2 spectra in the same Raw file.
For high complexity samples, oversampling of individual 3D-peaks automatically leads to undersampling or even omission of other 3D-peaks, reducing the number of identified peptides. Oversampling occurs in low-complexity samples or with long LC gradients, as well as with undersized dynamic exclusion windows for data-dependent acquisitions.
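
Based on the definition above, the share of non-oversampled 3D-peaks can be sketched as below. As a simplifying assumption, a 3D-peak is approximated here by its (sequence, charge) pair within one Raw file; the function name and example peptides are hypothetical.

```python
from collections import Counter


def non_oversampled_fraction(identified_ms2):
    """Fraction of peptide ions identified by exactly one MS2 spectrum.

    identified_ms2: one (sequence, charge) tuple per identified MS2 spectrum
    of a single Raw file.
    """
    spectra_per_peak = Counter(identified_ms2)
    if not spectra_per_peak:
        raise ValueError("no identified MS2 spectra given")
    single = sum(1 for n in spectra_per_peak.values() if n == 1)
    return single / len(spectra_per_peak)


spectra = [
    ("PEPTIDEK", 2), ("PEPTIDEK", 2),  # same ion hit twice: oversampled
    ("PEPTIDEK", 3),                   # different charge state: separate ion
    ("ELVISK", 2),
    ("LIVESR", 2),
]
print(non_oversampled_fraction(spectra))  # 0.75
```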
Missed Cleavages per Raw file
[Excludes Contaminants] Missed Cleavages per Raw file.
Under optimal digestion conditions (high enzyme grade etc.), only a few missed cleavages (MC) are expected. In general, increased MC counts also increase the number of peptide signals, thus cluttering the available space and potentially provoking overlapping peptide signals, which biases peptide quantification. Thus, low MC counts should be favored. Interestingly, it has been shown that incorporation of peptides with missed cleavages does not negatively influence protein quantification (see Chiva, C., Ortega, M., and Sabido, E. Influence of the Digestion Technique, Protease, and Missed Cleavage Peptides in Protein Quantitation. J. Proteome Res. 2014, 13, 3979-86). However, this is true only if all samples show the same degree of digestion. High missed cleavage values can indicate, for example, a) failed digestion, b) a high (post-digestion) protein contamination, or c) a sample with high amounts of nonspecifically degraded peptides which are not digested by trypsin.
If MC>=1 is high (>20%), you should increase the missed cleavages setting in MaxQuant and compare the number of peptides. Usually, high MC correlates with bad identification rates, since many spectra cannot be matched to the forward database.
In the rare case that 'no enzyme' was specified in MaxQuant, neither scores nor plots are shown.
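
The 20% rule of thumb for MC>=1 can be checked with a short sketch. It assumes a list with one missed-cleavage count per identified peptide of a Raw file; the function names are hypothetical and the default limit of 0.20 mirrors the ">20%" mentioned above.

```python
from collections import Counter


def missed_cleavage_fractions(mc_counts):
    """Fraction of peptides per missed-cleavage count (0, 1, 2, ...)."""
    if not mc_counts:
        raise ValueError("no peptides given")
    counts = Counter(mc_counts)
    return {mc: n / len(mc_counts) for mc, n in sorted(counts.items())}


def mc_share_is_high(mc_counts, limit=0.20):
    """True if the share of peptides with at least one missed cleavage exceeds limit."""
    if not mc_counts:
        raise ValueError("no peptides given")
    with_mc = sum(1 for mc in mc_counts if mc >= 1)
    return with_mc / len(mc_counts) > limit


peptides = [0] * 7 + [1] * 2 + [2]  # hypothetical Raw file with 10 peptides
print(missed_cleavage_fractions(peptides))
print(mc_share_is_high(peptides))  # True, since 3 of 10 peptides have MC >= 1
```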
IDs over RT
Distribution of retention time, derived from the evidence table.
This plot shows the uncalibrated retention time (in minutes) of the precursor ion in its elution profile. Potential contaminants are excluded.
Peak width over RT
Distribution of widths of peptide elution peaks, derived from the evidence table.
The distribution of the widths of peptide elution peaks, derived from the evidence table and excluding potential contaminants, is one indicator of optimal and reproducible chromatographic separation.
Uncalibrated Mass Error
[Excludes Contaminants] Mass accuracy before calibration.
Mass error of the uncalibrated mass-over-charge value of the precursor ion in comparison to the predicted monoisotopic mass of the identified peptide sequence.
Calibrated Mass Error
[Excludes Contaminants] Mass accuracy after calibration.
Mass error of the recalibrated mass-over-charge value of the precursor ion in comparison to the predicted monoisotopic mass of the identified peptide sequence in parts per million.
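
For reference, a mass error in parts per million is the relative deviation of the measured m/z from the predicted m/z, scaled by one million. The sketch below is a generic illustration with hypothetical function names; it assumes protonated precursor ions when converting a monoisotopic peptide mass to m/z.

```python
import math

PROTON_MASS = 1.007276  # Da, rounded


def theoretical_mz(monoisotopic_mass, charge):
    """Predicted m/z of a peptide carrying `charge` protons (assumption)."""
    return (monoisotopic_mass + charge * PROTON_MASS) / charge


def mass_error_ppm(observed_mz, predicted_mz):
    """Deviation of the observed from the predicted m/z in parts per million."""
    return (observed_mz - predicted_mz) / predicted_mz * 1e6


# An ion measured at m/z 500.001 instead of 500.000 is off by about 2 ppm.
print(round(mass_error_ppm(500.001, 500.0), 3))
```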
TopN
This metric somewhat summarizes "TopN over RT".
Reaching TopN on a regular basis indicates that all sections of the LC gradient deliver a sufficient number of peptides to keep the instrument busy.
TopN over RT
TopN over retention time.
Similar to ID over RT, this metric reflects the complexity of the sample at any point in time. Ideally, complexity should be made roughly equal (constant) by choosing a proper (non-linear) LC gradient. See Moruz 2014, DOI: 10.1002/pmic.201400036 for details.
Ion Injection Time over RT
Ion injection time score - should be as low as possible to allow fast cycles. Correlated with peptide intensity. Note that this threshold needs customization depending on the instrument used (e.g., ITMS vs. FTMS).